Background

From the sf vignette:

Simple features or simple feature access refers to a formal standard (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers, with emphasis on the spatial geometry of these objects. It also describes how such objects can be stored in and retrieved from databases, and which geometrical operations should be defined for them.

The standard is widely implemented in spatial databases (such as PostGIS), commercial GIS (e.g., ESRI ArcGIS) and forms the vector data basis for libraries such as GDAL. A subset of simple features forms the GeoJSON standard.

R has well-supported classes for storing spatial data (sp) and interfacing to the above mentioned environments (rgdal, rgeos), but has so far lacked a complete implementation of simple features, making conversions at times convoluted, inefficient or incomplete. The package sf tries to fill this gap, and aims at succeeding sp in the long term.

The sf package is an R implementation of Simple Features. This package incorporates:

  • a new spatial data class system in R
  • functions for reading and writing data
  • tools for spatial operations on vectors

Most of the functions in this package starts with prefix st_ which stands for spatial and temporal.

Reading a shapefile

If you’ve used readOGR() from the rgdal package, you’ll notice the similarities in arguments for the st_read() function. We’ll do a quick comparison of the two functions here.

Read in a shapefile of the US West Coast using readOGR() from the rgdal package, and st_read from the sf package.

  • dsn is the path name
  • layer is the name of the file

NOTE: you do not need to add an extension to the layer name

## Read in shapefile using rgdal

system.time(ak_shp_rgdal <- readOGR(dsn="shapefiles", layer="sasap_regions"))
## OGR data source with driver: ESRI Shapefile 
## Source: "/home/sjclark/spatial-analysis-R/shapefiles", layer: "sasap_regions"
## with 13 features
## It has 2 fields
##    user  system elapsed 
##   0.256   0.000   0.258
object.size(ak_shp_rgdal)
## 3046112 bytes
plot(ak_shp_rgdal)

## Read in shapefile using sf
system.time(ak_shp_sf <- st_read(dsn = "shapefiles", layer="sasap_regions")) 
## Reading layer `sasap_regions' from data source `/home/sjclark/spatial-analysis-R/shapefiles' using driver `ESRI Shapefile'
## Simple feature collection with 13 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -2175343 ymin: 405519.1 xmax: 1579226 ymax: 2383770
## epsg (SRID):    NA
## proj4string:    +proj=aea +lat_1=55 +lat_2=65 +lat_0=50 +lon_0=-154 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs
##    user  system elapsed 
##   0.044   0.000   0.045
object.size(ak_shp_sf)
## 2916384 bytes
plot(ak_shp_sf)      

You’ll notice right away that these two objects are being plotted differently. This is because these two objects are of different types.

sf objects usually have two classes - sf and data.frame. Two main differences comparing to a regular data.frame object are spatial metadata (geometry type, dimension, bbox, epsg (SRID), proj4string) and additional column - typically named geom or geometry.

class(ak_shp_sf)
## [1] "sf"         "data.frame"

Coordinate Reference System

Every sf object needs a coordinate reference system (or crs) defined in order to work with it correctly. You can view what crs is set by using the function st_crs

st_crs(ak_shp_sf)
## Coordinate Reference System:
##   No EPSG code
##   proj4string: "+proj=aea +lat_1=55 +lat_2=65 +lat_0=50 +lon_0=-154 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs"

CRS definition strings can be really confusing! One convenient way to reference them quickly is the EPSG code, a number that represents a standard projection and datum. You can check out a list of (lots!) of EPSG codes here.

The string above (even though there is no EPSG code) is actually referencing the Alaska Albers projection. This projection has the EPSG code 3338.

The st_transform function will transform coordinates. Say we want our coordinates as an unprojected WGS84 datum, the EPSG code 4326 will be best for this, so let’s transform our coordinate system to this.

ak_shp_sf <- ak_shp_sf %>%
  st_transform(crs = 4326)

st_crs(ak_shp_sf)
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"

Attributes

sf objects can be used as a regular data.frame object in many operations

ak_shp_sf
## Simple feature collection with 13 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -179.2296 ymin: 51.15683 xmax: 179.8567 ymax: 71.43957
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##    region_id           region                       geometry
## 1          1 Aleutian Islands MULTIPOLYGON (((-171.1345 5...
## 2          2           Arctic MULTIPOLYGON (((-139.9552 6...
## 3          3      Bristol Bay MULTIPOLYGON (((-159.8745 5...
## 4          4          Chignik MULTIPOLYGON (((-155.8282 5...
## 5          5     Copper River MULTIPOLYGON (((-143.8874 5...
## 6          6           Kodiak MULTIPOLYGON (((-151.9997 5...
## 7          7         Kotzebue MULTIPOLYGON (((-168.8523 6...
## 8          8        Kuskokwim MULTIPOLYGON (((-172.8928 6...
## 9          9       Cook Inlet MULTIPOLYGON (((-153.3848 5...
## 10        10     Norton Sound MULTIPOLYGON (((-171.3799 6...
nrow(ak_shp_sf)
## [1] 13
ncol(ak_shp_sf)
## [1] 3

sf & dplyr

It’s easy to use the dplyr package on sf objects, just like you would a data.frame. Here are a couple of simple examples:

select()

ak_shp_sf %>%
  select(region, geometry)
## Simple feature collection with 13 features and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -179.2296 ymin: 51.15683 xmax: 179.8567 ymax: 71.43957
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##              region                       geometry
## 1  Aleutian Islands MULTIPOLYGON (((-171.1345 5...
## 2            Arctic MULTIPOLYGON (((-139.9552 6...
## 3       Bristol Bay MULTIPOLYGON (((-159.8745 5...
## 4           Chignik MULTIPOLYGON (((-155.8282 5...
## 5      Copper River MULTIPOLYGON (((-143.8874 5...
## 6            Kodiak MULTIPOLYGON (((-151.9997 5...
## 7          Kotzebue MULTIPOLYGON (((-168.8523 6...
## 8         Kuskokwim MULTIPOLYGON (((-172.8928 6...
## 9        Cook Inlet MULTIPOLYGON (((-153.3848 5...
## 10     Norton Sound MULTIPOLYGON (((-171.3799 6...

filter()

ak_shp_sf %>%
  filter(region == "Southeast")
##   region_id    region                       geometry
## 1        12 Southeast MULTIPOLYGON (((-133.5994 5...

Joins

You can also use the sf package to create spatial joins, useful for when you want to utilize two datasets together. As an example, let’s ask a question: how many people live in each of these Alaska regions?

We have some population data, but it gives the number of people by city, not by region. To determine the number of people per region we will need to:

  • use a spatial join (st_join) to assign each city to a region
  • use group_by and summarize to calculate the total population by region

First, read in the population data as a regular data.frame.

pop <- read.csv("shapefiles/alaska_poulation.csv")

The st_join function is a spatial left or inner join. The arguments for both the left and right tables are objects of class sf which means we will first need to turn our population data.frame with latitude and longitude coordinates into an sf object.

We can do this easily using the st_as_sf function, which takes as arguments the coordinates and the crs. The remove = F specification here ensures that when we create our geometry column, we retain our original lat lng columns, which we will need later for plotting.

pop_sf <- st_as_sf(pop, 
                  coords = c('lng', 'lat'),
                  crs = 4326,
                  remove = F)

head(pop_sf)
## Simple feature collection with 6 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -176.6581 ymin: 51.88 xmax: -154.1703 ymax: 62.68889
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##   year     city      lat       lng population                   geometry
## 1 2015     Adak 51.88000 -176.6581        122    POINT (-176.6581 51.88)
## 2 2015   Akhiok 56.94556 -154.1703         84 POINT (-154.1703 56.94556)
## 3 2015 Akiachak 60.90944 -161.4314        562 POINT (-161.4314 60.90944)
## 4 2015    Akiak 60.91222 -161.2139        399 POINT (-161.2139 60.91222)
## 5 2015   Akutan 54.13556 -165.7731        899 POINT (-165.7731 54.13556)
## 6 2015 Alakanuk 62.68889 -164.6153        777 POINT (-164.6153 62.68889)

Now we can do our spatial join!

pop_joined_sf <- st_join(pop_sf, ak_shp_sf, join = st_within)

head(pop_joined_sf)
## Simple feature collection with 6 features and 7 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -176.6581 ymin: 51.88 xmax: -154.1703 ymax: 62.68889
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##   year     city      lat       lng population region_id           region
## 1 2015     Adak 51.88000 -176.6581        122         1 Aleutian Islands
## 2 2015   Akhiok 56.94556 -154.1703         84         6           Kodiak
## 3 2015 Akiachak 60.90944 -161.4314        562         8        Kuskokwim
## 4 2015    Akiak 60.91222 -161.2139        399         8        Kuskokwim
## 5 2015   Akutan 54.13556 -165.7731        899         1 Aleutian Islands
## 6 2015 Alakanuk 62.68889 -164.6153        777        13            Yukon
##                     geometry
## 1    POINT (-176.6581 51.88)
## 2 POINT (-154.1703 56.94556)
## 3 POINT (-161.4314 60.90944)
## 4 POINT (-161.2139 60.91222)
## 5 POINT (-165.7731 54.13556)
## 6 POINT (-164.6153 62.68889)

Next we compute the total population for each region

pop_region <- pop_joined_sf %>% 
  group_by(region) %>% 
  summarise(total_pop = sum(population))

And use a regular left_join to get the information back to the Alaska region shapefile.

ak_pop_sf <- left_join(ak_shp_sf, pop_region)

Save

Save the spatial object to disk using st_write() and specifying the filename as well as the driver.

st_write(ak_pop_sf, "shapefiles/ak_regions_population.shp", driver = "ESRI Shapefile", delete_layer = TRUE)
## Deleting layer `ak_regions_population' using driver `ESRI Shapefile'
## Writing layer `ak_regions_population' to data source `/home/sjclark/spatial-analysis-R/shapefiles/ak_regions_population.shp' using driver `ESRI Shapefile'
## features:       13
## fields:         3
## geometry type:  Multi Polygon

Visualize with ggplot

ggplot2 now has integrated functionality to plot sf objects using geom_sf().

#simplest plot
ggplot(ak_pop_sf) +
  geom_sf()

Woah this looks bad!! Remember when we transformed our data from the Alaska Albers projection to an unprojected dataset? Turns out that a projection might be useful here. Let’s create a new object with the Alaska Albers projection for a better display.

ak_pop_sf_aa <- ak_pop_sf %>% st_transform(crs = 3338)

ggplot(ak_pop_sf_aa) +
  geom_sf()

This is useful to make sure your file looks correct but doesn’t display any information about the data. We can plot these regions and fill each polygon based on the population

ggplot(ak_pop_sf_aa) +
  geom_sf(aes(fill = total_pop))

We can clean it up a bit, applying a cleaner theme and assigning a continuous color palette.

ggplot(ak_pop_sf_aa) +
  geom_sf(aes(fill = total_pop)) +
  theme_bw() +
  labs(fill = "Total Population") +
  scale_fill_continuous(low = "khaki", high =  "firebrick", labels = comma)

Visualize with leaflet

We can also make an interactive map using leaflet.

Leaflet (unlike ggplot) will re-project data for you if you give it a projection. The catch is that you have to give it both a projection to project to (like Alaska Albers), and that the coordinates of your shapefile must be referenced the WGS84 datum, and unprojected. This means that we need to use our shapefile with the 4326 EPSG code. Remember you can always check what crs you have set using st_crs.

Here we define a leaflet projection for Alaska Albers, and save it as a variable to use later.

epsg3338 <- leaflet::leafletCRS(
  crsClass = "L.Proj.CRS",
  code = "EPSG:3338",
  proj4def =  "+proj=aea +lat_1=55 +lat_2=65 +lat_0=50 +lon_0=-154 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs",
  resolutions = 2^(16:7))

You’ll notice that the leaflet projection information looks a lot like the crs information from our shapefile that is in Alaska Albers!

st_crs(ak_pop_sf_aa)
## Coordinate Reference System:
##   EPSG: 3338 
##   proj4string: "+proj=aea +lat_1=55 +lat_2=65 +lat_0=50 +lon_0=-154 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"

Here is a simple map:

m <- leaflet(options = leafletOptions(crs = epsg3338)) %>%
        addPolygons(data = ak_pop_sf, 
                    fillColor = "gray",
                    weight = 1)

m

A slightly more complex one:

pal <- colorNumeric(palette = "Reds", domain = ak_pop_sf$total_pop)

m <- leaflet(options = leafletOptions(crs = epsg3338)) %>%
        addPolygons(data = ak_pop_sf, 
                    fillColor = ~pal(total_pop),
                    weight = 1,
                    color = "black",
                    fillOpacity = 1,
                    label = ~region) %>% 
        addLegend(position = "bottomleft",
                  pal = pal,
                  values = range(ak_pop_sf$total_pop),
                  title = "Total Population")

m

Even more bells and whistles!

pal <- colorNumeric(palette = "Reds", domain = ak_pop_sf$total_pop)

m <- leaflet(options = leafletOptions(crs = epsg3338)) %>%
        addPolygons(data = ak_pop_sf, 
                    fillColor = ~pal(total_pop),
                    weight = 1,
                    color = "black",
                    fillOpacity = 1) %>% 
        addCircleMarkers(data = pop_sf,
                         lat = ~lat,
                         lng = ~lng,
                         radius = ~log(population/500), # arbitrary scaling
                         fillColor = "gray",
                         fillOpacity = 1,
                         weight = 0.25,
                         color = "black",
                         label = ~paste0(pop_sf$city, ", population ", comma(pop_sf$population))) %>%
        addLegend(position = "bottomleft",
                  pal = pal,
                  values = range(ak_pop_sf$total_pop),
                  title = "Total Population")

m

There is a lot more functionality to sf including the ability to intersect polygons, calculate distance, create a buffer, and more. Here are some more great resources and tutorials for a deeper dive into this great package:

Spatial analysis in R with the sf package
Intro to Spatial Analysis
sf github repo
Tidy spatial data in R: using dplyr, tidyr, and ggplot2 with sf
mapping-fall-foliage-with-sf